On 2020-01-14 I ran the analysis of Gene Ontology terms. Here I want to focus on part of those results and make some findings clearer. I will use results only from the annotation of genes, not from the annotation of transcripts, which is a somewhat noisier dataset. I will not look at the functions affected by hatching condition, since it affects only a couple of genes, which I will inspect later. Among all cellular functions significantly enriched with genes differentially expressed between selective regimes, I will focus on those that are most significant and detailed enough. For example, I skip terms like transmembrane transport, proteolysis or protein phosphorylation, because they are too general. Sections below are named after the functional categories that I think are worth discussing. I will try to contextualize the results. My entry point to the literature is Garcia-Roger et al. (2019).
I need the results of the differential expression analysis (2020-01-08) to identify the genes annotated with the significant GO terms, and to assess the direction of gene expression change between selective regimes.
library(variancePartition)
library(GO.db)
library(ggplot2)
library(tidyr)
library(reactable)
ANNOTATION <- '../2019-07-26/genes/annotation.txt'
EXPRESSION <- '../2020-01-08/genes.RData' # variance partition and the mixed models fitted with dream()
The annotation table includes only the most specific terms that can be assigned to a gene. The more general ancestor terms are implied, but not explicitly listed in the annotation. To retrieve all genes annotated to a term, I need to expand the annotation and make the ancestor terms explicit. It took me a while to realize that I can do this with the unlist() function:
annotation <- read.table(ANNOTATION, col.names = c('gene', 'GOterms'), colClasses = c('character', 'character'))
head(annotation)
## gene GOterms
## 1 XLOC_000002 GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009 GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010 GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021 GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036 GO:0005272|GO:0006814|GO:0016020
annotation.list <- strsplit(annotation$GOterms, split='|', fixed=TRUE)
names(annotation.list) <- annotation$gene
# append() joins only two vectors at a time, so the calls are nested. It also works with lists.
allAncestors <- append(as.list(GOBPANCESTOR),
append(as.list(GOMFANCESTOR),
as.list(GOCCANCESTOR)))
fullAnnotation <- lapply(annotation.list,
FUN = function(x){
unique(append(x,
unlist(allAncestors[x], use.names=FALSE)))
})
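As a toy illustration of the expansion above, the following sketch applies the same lapply()/unlist() pattern to a made-up annotation and a made-up ancestor list. The identifiers GO:A, GO:B, GO:C and GO:root, and the objects toyAncestors, toyAnnotation, toyList and toyFull are hypothetical; they stand in for the real GO.db objects and the real annotation table.

```r
# Hypothetical ancestor map: each term points to all of its ancestors.
toyAncestors <- list('GO:A' = c('GO:B', 'GO:root'),
                     'GO:B' = 'GO:root',
                     'GO:C' = 'GO:root')

# Hypothetical annotation, pipe-separated like the GOterms column.
toyAnnotation <- data.frame(gene    = c('gene1', 'gene2'),
                            GOterms = c('GO:A|GO:C', 'GO:B'),
                            stringsAsFactors = FALSE)

# fixed=TRUE matters: as a regular expression, '|' means alternation.
toyList <- strsplit(toyAnnotation$GOterms, split = '|', fixed = TRUE)
names(toyList) <- toyAnnotation$gene

# Indexing the ancestor list with a vector of terms returns a sub-list;
# unlist() flattens it and unique() drops the repeated ancestors.
toyFull <- lapply(toyList, function(x) {
  unique(c(x, unlist(toyAncestors[x], use.names = FALSE)))
})
print(toyFull)
```

In this toy case gene1 ends up with its two direct terms plus GO:B and GO:root, each listed once.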
load(EXPRESSION)
ls()
## [1] "allAncestors" "annotation" "ANNOTATION" "annotation.list"
## [5] "EXPRESSION" "fitmm" "fullAnnotation" "varPart"
## [9] "vobjDream"
head(varPart)
## population regime treatment Residuals
## XLOC_000002 0 0.7508718 0.050900336 0.1982279
## XLOC_000007 0 0.1579138 0.002683147 0.8394031
## XLOC_000008 0 0.7439376 0.008721038 0.2473413
## XLOC_000010 0 0.6887574 0.026242757 0.2849999
## XLOC_000015 0 0.6014947 0.080366812 0.3181385
## XLOC_000017 0 0.6576267 0.104331931 0.2380414
Nucleotide-excision repair (GO:0006289) is among the most significantly enriched terms, with only 13 genes annotated.
goterm <- 'GO:0006289'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
# Not all genes annotated have been used in expression analysis
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
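The grep() call above relies on grep() coercing each element of the list to a single character string before searching it. A more explicit alternative, sketched here on made-up data, tests membership term by term. The helper genesWithTerm(), the toy annotation and the vector of expressed genes are hypothetical and not part of the original analysis.

```r
# Hypothetical expanded annotation: gene -> all GO terms (ancestors included).
toyFull <- list(gene1 = c('GO:A', 'GO:B'),
                gene2 = 'GO:B',
                gene3 = 'GO:C')

# Names of the genes annotated with a term, by exact membership.
genesWithTerm <- function(term, annotationList) {
  hit <- vapply(annotationList, function(terms) term %in% terms, logical(1))
  names(annotationList)[hit]
}

# Hypothetical set of genes that were used in the expression analysis.
expressed <- c('gene2', 'gene3')

genes <- genesWithTerm('GO:B', toyFull)
# Keep only genes with expression results; with unique gene names this
# gives the same result as genes[genes %in% expressed].
genes <- intersect(genes, expressed)
print(genes)
```

Wrapping the lookup in a function would also avoid repeating the same two lines for every term.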
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
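The same six column definitions are repeated for every table in this report. A minimal sketch of how they could be built once and reused, assuming the reactable package is installed; the names numCols and threeDigits are mine and do not appear in the original analysis.

```r
# Numeric columns of the topTable() output that are shown with 3 digits.
numCols <- c('logFC', 'AveExpr', 't', 'P.Value', 'adj.P.Val', 'z.std')

# Build the named list of column definitions once, if reactable is available.
if (requireNamespace('reactable', quietly = TRUE)) {
  threeDigits <- setNames(
    lapply(numCols, function(col) {
      reactable::colDef(format = reactable::colFormat(digits = 3))
    }),
    numCols)
  # Each table could then be drawn with:
  # reactable::reactable(z2, columns = threeDigits)
}
```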
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
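The reshaping and labelling steps above can be illustrated on a small made-up matrix. This sketch uses base R instead of pivot_longer(), and a named lookup vector instead of assigning the 'random' level afterwards. The sample names s1 to s4, the gene names and the assignment of samples to regimes are all hypothetical.

```r
# Hypothetical expression matrix: 2 genes (rows) by 4 samples (columns).
expr <- matrix(c(1, 2, 3, 4,
                 5, 6, 7, 8),
               nrow = 2, byrow = TRUE,
               dimnames = list(c('geneA', 'geneB'),
                               c('s1', 's2', 's3', 's4')))

# Hypothetical lookup: sample name -> selective regime.
regimeOf <- c(s1 = 'random', s2 = 'regular', s3 = 'random', s4 = 'regular')

# Long format: one row per gene and sample. as.vector() reads the matrix
# column by column, so genes cycle fastest and samples slowest.
GE <- data.frame(gene       = rep(rownames(expr), times = ncol(expr)),
                 sample     = rep(colnames(expr), each = nrow(expr)),
                 expression = as.vector(expr),
                 stringsAsFactors = FALSE)
GE$regime <- factor(unname(regimeOf[GE$sample]),
                    levels = c('regular', 'random'))
print(GE)
```

Keeping the sample-to-regime mapping in one object means the list of sample names would not have to be repeated in every section.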
goterm <- 'GO:0007186'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
Among the 206 genes in this category, 55 have an adjusted p-value lower than or equal to 0.1. There are too many genes annotated with this function to plot the expression levels by regime for all of them. I select the most differentially expressed ones.
z2 <- z2[z2$adj.P.Val <= 0.1,]
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
Among the 206 genes analysed from the G protein-coupled receptor signaling pathway, 173 have a reduced expression level in the unpredictable environment, and 33 show an increased expression level.
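Counts like these can be read from the sign of logFC. The sketch below uses a made-up table, and it assumes that a negative logFC means lower expression in the unpredictable (random) regime, which depends on how the contrast was coded in the 2020-01-08 analysis. The object names toy, direction and nSignificant are hypothetical.

```r
# Hypothetical excerpt of a topTable() result.
toy <- data.frame(logFC     = c(-1.2, -0.4, 0.8, -2.1, 0.3),
                  adj.P.Val = c(0.01, 0.20, 0.05, 0.09, 0.50),
                  row.names = paste0('gene', 1:5))

# Direction of change for every gene in the category.
direction <- ifelse(toy$logFC < 0, 'reduced', 'increased')
print(table(direction))

# Number of genes at or below the FDR threshold used in the text.
nSignificant <- sum(toy$adj.P.Val <= 0.1)
print(nSignificant)
```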
goterm <- 'GO:0007160'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
Among the 16 genes involved in cell-matrix adhesion, 14 have a reduced expression level in the unpredictable environment, and 2 show an increased expression level.
Several functions related to ion transport are significant. I wonder to what extent their significance depends on a common subset of genes.
goterm_K <- 'GO:0006813'
goterm_COOH <- 'GO:0046942'
goterm_Sulf <- 'GO:0008272'
goterm_Nucl <- 'GO:1901642'
goterm_TM <- 'GO:0034220'
genes_K <- names(fullAnnotation[grep(goterm_K, fullAnnotation, fixed=TRUE)])
genes_COOH <- names(fullAnnotation[grep(goterm_COOH, fullAnnotation, fixed=TRUE)])
genes_Sulf <- names(fullAnnotation[grep(goterm_Sulf, fullAnnotation, fixed=TRUE)])
genes_Nucl <- names(fullAnnotation[grep(goterm_Nucl, fullAnnotation, fixed=TRUE)])
genes_TM <- names(fullAnnotation[grep(goterm_TM, fullAnnotation, fixed=TRUE)])
vennDiagram(vennCounts(cbind(Potassium = row.names(varPart) %in% genes_K,
Carboxylic = row.names(varPart) %in% genes_COOH,
Sulfate = row.names(varPart) %in% genes_Sulf,
Nucleoside = row.names(varPart) %in% genes_Nucl,
Transmembrane = row.names(varPart) %in% genes_TM)))
There is hardly any overlap among the genes involved in the transport of these substances. The only remarkable overlap is between potassium ion transport and ion transmembrane transport. Still, the significance of each term should be largely independent of the significance of the others. This was expected, because the functional enrichment analysis applied algorithms that account for the dependence structure among GO terms. I analyse the terms separately below.
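A Venn diagram with five sets is hard to read, so the overlaps could also be tabulated. The following sketch counts shared genes for every pair of terms in a made-up list of gene sets; the set contents and the names geneSets and overlap are hypothetical.

```r
# Hypothetical gene sets for three GO terms.
geneSets <- list(Potassium     = c('g1', 'g2', 'g3'),
                 Sulfate       = c('g4'),
                 Transmembrane = c('g2', 'g3', 'g5', 'g6'))

# Matrix of shared genes: entry [i, j] is the size of the intersection of
# sets i and j; the diagonal holds the size of each set.
overlap <- sapply(geneSets, function(a) {
  sapply(geneSets, function(b) length(intersect(a, b)))
})
print(overlap)
```

With the real data, the list would hold genes_K, genes_COOH, genes_Sulf, genes_Nucl and genes_TM after filtering them to the genes present in varPart.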
goterm <- goterm_K
genes <- genes_K # defined above
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
#plotPercentBars(varPart[genes,])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_TM
genes <- genes_TM # defined above.
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[z2$adj.P.Val <= 0.1,]
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) + geom_boxplot() +
facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_Nucl
genes <- genes_Nucl
genes <- genes[genes %in% row.names(varPart)] # drop genes without expression results
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) + geom_boxplot() +
facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_COOH
genes <- genes_COOH
genes <- genes[genes %in% row.names(varPart)] # drop genes without expression results
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
# I wanted the table to be ordered by FDR, but I need it now ordered by logFC:
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0005992'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) + geom_boxplot() +
facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm_cellmoti <- 'GO:0048870' # cell motility
goterm_movement <- 'GO:0003341' # cilium movement
goterm_assembly <- 'GO:0060271' # cilium assembly
genes_cellmoti <- names(fullAnnotation[grep(goterm_cellmoti, fullAnnotation, fixed=TRUE)])
genes_movement <- names(fullAnnotation[grep(goterm_movement, fullAnnotation, fixed=TRUE)])
genes_assembly <- names(fullAnnotation[grep(goterm_assembly, fullAnnotation, fixed=TRUE)])
vennDiagram(vennCounts(cbind('Cell motility' = row.names(varPart) %in% genes_cellmoti,
'Cilium movement' = row.names(varPart) %in% genes_movement,
'Cilium assembly' = row.names(varPart) %in% genes_assembly)))
goterm <- goterm_cellmoti
genes <- genes_cellmoti
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_movement
genes <- genes_movement
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- goterm_assembly
genes <- genes_assembly
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0043161'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0006979'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0003774'
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
# I limit the number of genes.
z2 <- z2[z2$adj.P.Val <= 0.1,]
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm_metallocarboxy <- 'GO:0004181'
goterm_metalloendopep <- 'GO:0004222'
goterm_serinetypeendo <- 'GO:0004252'
goterm_calciumcystein <- 'GO:0004198'
genes_metallocarboxy <- names(fullAnnotation[grep(goterm_metallocarboxy, fullAnnotation, fixed=TRUE)])
genes_metalloendopep <- names(fullAnnotation[grep(goterm_metalloendopep, fullAnnotation, fixed=TRUE)])
genes_serinetypeendo <- names(fullAnnotation[grep(goterm_serinetypeendo, fullAnnotation, fixed=TRUE)])
genes_calciumcystein <- names(fullAnnotation[grep(goterm_calciumcystein, fullAnnotation, fixed=TRUE)])
vennDiagram(vennCounts(cbind(Metallocarboxy. = row.names(varPart) %in% genes_metallocarboxy,
Metalloendo. = row.names(varPart) %in% genes_metalloendopep,
Serine_type = row.names(varPart) %in% genes_serinetypeendo,
Cysteine_type = row.names(varPart) %in% genes_calciumcystein)))
goterm <- 'GO:0004181'
Term(GOTERM[goterm])
## GO:0004181
## "metallocarboxypeptidase activity"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0004252'
Term(GOTERM[goterm])
## GO:0004252
## "serine-type endopeptidase activity"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[z2$adj.P.Val <= 0.1,] # only significantly regulated genes
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0004198'
Term(GOTERM[goterm])
## GO:0004198
## "calcium-dependent cysteine-type endopeptidase activity"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[z2$adj.P.Val <= 0.13,] # genes with FDR <= 0.13 only (more permissive than the 0.1 used above)
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0004222'
Term(GOTERM[goterm])
## GO:0004222
## "metalloendopeptidase activity"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
z2 <- z2[z2$adj.P.Val <= 0.11,] # genes with FDR <= 0.11 only (slightly more permissive than the 0.1 used above)
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
goterm <- 'GO:0016715'
Term(GOTERM[goterm])
## GO:0016715
## "oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen"
genes <- names(fullAnnotation[grep(goterm, fullAnnotation, fixed=TRUE)])
genes <- genes[genes %in% row.names(varPart)]
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'Residuals')])
z2 <- topTable(fitmm, coef='regime', number=length(fitmm$F))
z2 <- z2[row.names(z2) %in% genes,] # still ordered by p-value!
reactable(z2,
columns = list(
logFC = colDef(format=colFormat(digits=3)),
AveExpr = colDef(format=colFormat(digits=3)),
t = colDef(format=colFormat(digits=3)),
P.Value = colDef(format=colFormat(digits=3)),
adj.P.Val = colDef (format=colFormat(digits=3)),
z.std = colDef(format=colFormat(digits=3))
))
#z2 <- z2[z2$adj.P.Val <= 0.11,] # only significantly regulated genes
z2 <- z2[order(z2$logFC),]
z <- as.data.frame(vobjDream$E) # this is the whole expression matrix
z <- z[row.names(z2),]
z$gene <- factor(row.names(z), levels=row.names(z2), ordered=TRUE)
GE <- pivot_longer(z, cols=c(1:12), names_to='sample', values_to='expression')
GE$regime <- factor('regular', levels=c('regular', 'random'))
GE[GE$sample %in% c("X1A_S8", "X1C_S1", "X2A_S7", "X2C_S5", "X4A_S6", "X4C_S12"), 'regime'] <- 'random'
ggplot(data=GE, mapping=aes(x=regime, y=expression, color=regime)) +
geom_boxplot() + facet_wrap(~gene) + ggtitle(Term(GOTERM[goterm]))
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=ca_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
## [5] LC_MONETARY=ca_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8
## [7] LC_PAPER=ca_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] reactable_0.1.0 tidyr_1.0.0 GO.db_3.10.0
## [4] AnnotationDbi_1.48.0 IRanges_2.20.2 S4Vectors_0.24.3
## [7] variancePartition_1.16.1 Biobase_2.46.0 BiocGenerics_0.32.0
## [10] scales_1.1.0 foreach_1.4.7 limma_3.42.0
## [13] ggplot2_3.2.1
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.6 bit64_0.9-7 splines_3.6.2
## [4] gtools_3.8.1 assertthat_0.2.1 blob_1.2.1
## [7] yaml_2.2.0 progress_1.2.2 pillar_1.4.3
## [10] RSQLite_2.2.0 backports_1.1.5 lattice_0.20-38
## [13] glue_1.3.1 digest_0.6.23 minqa_1.2.4
## [16] colorspace_1.4-1 htmltools_0.4.0 Matrix_1.2-18
## [19] plyr_1.8.5 reactR_0.4.2 pkgconfig_2.0.3
## [22] purrr_0.3.3 gdata_2.18.0 lme4_1.1-21
## [25] BiocParallel_1.20.1 tibble_2.1.3 farver_2.0.3
## [28] withr_2.1.2 lazyeval_0.2.2 pbkrtest_0.4-7
## [31] magrittr_1.5 crayon_1.3.4 memoise_1.1.0
## [34] evaluate_0.14 doParallel_1.0.15 nlme_3.1-143
## [37] MASS_7.3-51.5 gplots_3.0.1.2 tools_3.6.2
## [40] prettyunits_1.1.1 hms_0.5.3 lifecycle_0.1.0
## [43] stringr_1.4.0 munsell_0.5.0 colorRamps_2.3
## [46] compiler_3.6.2 caTools_1.18.0 rlang_0.4.2
## [49] grid_3.6.2 nloptr_1.2.1 iterators_1.0.12
## [52] htmlwidgets_1.5.1 labeling_0.3 bitops_1.0-6
## [55] rmarkdown_2.1 boot_1.3-24 gtable_0.3.0
## [58] codetools_0.2-16 DBI_1.1.0 reshape2_1.4.3
## [61] R6_2.4.1 knitr_1.27 dplyr_0.8.3
## [64] bit_1.1-15.1 zeallot_0.1.0 KernSmooth_2.23-16
## [67] stringi_1.4.5 Rcpp_1.0.3 vctrs_0.2.1
## [70] tidyselect_0.2.5 xfun_0.12
Garcia-Roger, Eduardo M., Esther Lubzens, Diego Fontaneto, and Manuel Serra. 2019. “Facing Adversity: Dormant Embryos in Rotifers.” The Biological Bulletin 237 (2): 119–44. https://doi.org/10.1086/705701.